The Linked Data Mining Challenge 2014: Results and Experiences

Authors

Vojtech Svátek

Jindrich Mynarz

Heiko Paulheim

Abstract

The 2014 edition of the Linked Data Mining Challenge, conducted in conjunction with Know@LOD 2014, was the third edition of this challenge. The underlying data came from two domains: public procurement and researcher collaboration. As in the previous year, when the challenge was held at the Data Mining on Linked Data workshop co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2013), the response to the challenge was lower than expected, with only one solution submitted for the predictive task this year. We tried to trace the reasons for the persistently low participation via a questionnaire survey, and distilled principles that could help organizers of similar future challenges.

1 The Linked Data Mining Challenge Overview

Linked data (LD) represents a novel type of data source that has so far been nearly untouched by advanced data mining methods. It breaks many traditional assumptions about source data and thus poses a number of challenges:

– While the individual published datasets typically follow a relatively regular, relational-like (or hierarchical, in the case of taxonomic classification) structure, the presence of semantic links among them makes the resulting ‘hyper-dataset’ akin to general graph datasets. On the other hand, compared to graphs such as social networks, there is a larger variety of link types in the graph.

– The datasets have been published for entirely different purposes, such as statistical data publishing based on the legal commitments of government bodies vs. publishing of encyclopedic data by internet volunteers vs. data sharing within a research community. This introduces further data modeling heterogeneity and an uneven degree of completeness and reliability.
– The amount and diversity of resources, as well as of their link sets, is steadily growing. This allows new linked datasets to be included in the mining dataset nearly on the fly, but at the same time makes the feature selection problem extremely hard.

The motivation for organizing the Linked Data Mining Challenge (LDMC) was twofold. First, it aimed to advertise the large quantities of linked data that have recently emerged [1, 7] to a community that may have an interest in such diverse real-world datasets for testing machine learning and data mining systems and algorithms. Second, the data mining experience provided by challenge participants could foster an exchange of ideas and methods addressing the particularities of Linked Data mining.

The call for challenge contributions was sent to several relevant mailing lists from the semantic web and data mining areas, as well as more general ones (e.g., [email protected], [email protected], [email protected], DBworld). However, the response was unsatisfying throughout all three editions. In summary:

– In 2012 there was no challenge result submission, and the workshop as such attracted only 1 submission and had to be canceled.

– In 2013 there were 3 challenge result submissions [2, 4]. On the other hand, there were 5 regular paper submissions to the workshop, and, most notably, the workshop attracted a significant number of participants (over 40).

– In 2014 there was only 1 challenge result submission [3]. There were 10 paper submissions and 25 participants registered for the workshop; however, this time the workshop itself was not primarily proposed as a framing for the challenge, but was a continuation of a previously started series.

Both the 2013 and 2014 editions were used as a platform to discuss, with the participants, the problems and opportunities of such a challenge event.
Furthermore, the 2014 edition was followed by a closed questionnaire survey (targeting only the registered Know@LOD’14 workshop participants) aimed at learning lessons from the LDMC organization endeavor. In this paper, we describe the challenge tasks, the datasets used, the process of data preparation, and, finally, the results of the questionnaire survey.

2 Tasks and Datasets

The 2014 edition of the Linked Data Mining Challenge comprised three tasks: one predictive and two exploratory.

2.1 Ordinal Prediction Task

The ordinal prediction task was prepared for all three editions, and was always related to the procurement domain. The target attribute to be predicted was the number of tenders for the respective public contract; the true value of this target attribute was not known for the evaluation dataset until the bidding period had closed (which was after the result submission deadline). The principal evaluation measure at the level of an individual object was the absolute value of the difference between the predicted value v̄ and the reference value v, adjusted by the reciprocal of the (smaller, except zero) value size and normalized to [0, 1] by a sigmoidal function:


Similar resources

Employing data mining to explore association rules in drug addicts

Drug addiction is a major social, economic, and hygienic challenge that impacts on all the community and needs serious threat. Available treatments are successful only in short-term unless underlying reasons making individuals prone to the phenomenon are not investigated. Nowadays, there are some treatment centers which have comprehensive information about addicted people. Therefore, given the ...


The Linked Data Mining Challenge 2015

The 2015 edition of the Linked Data Mining Challenge, conducted in conjunction with Know@LOD 2015, has been the third edition of this challenge. This year’s dataset collected movie ratings, where the task was to classify well and badly rated movies. The solutions submitted reached an accuracy of almost 95%, which is a clear advancement over the baseline of 60%. However, there is still headroom ...


Not-So-Linked Solution to the Linked Data Mining Challenge 2016

We present a solution for the Linked Data Mining Challenge 2016, that achieved 92.5% accuracy according to the submission system. The solution uses a hand-crafted dataset, that was created by scraping various websites for reviews. We use logistic regression to learn a classification model and we publish all our results to GitHub.


Graph Kernels for Task 1 and 2 of the Linked Data Data Mining Challenge 2013

In this paper we present the application of two RDF graph kernels to task 1 and 2 of the linked data data-mining challenge. Both graph kernels use term vectors to handle RDF literals. Based on experiments with the task data, we use the Weisfeiler-Lehman RDF graph kernel for task 1 and the intersection path tree kernel for task 2 in our final classifiers for the challenge. Applying these graph k...


Linked Data Mining Challenge (LDMC) 2013 Summary

The paper summarizes the conception, data preparation and result evaluation of the LDMC, which has been organized in connection with the DMoLD’13 Data Mining on Linked Data Workshop, Prague, September 23 (as part of the ECML/PKDD conference program).



Publication date: 2014
